我们介绍了Sparrow,这是一个寻求信息的对话代理,与提示的语言模型基线相比,训练有素,更有帮助,正确和无害。我们使用从人类反馈中的强化学习来培训我们的模型,以帮助人类评估者判断代理人的行为。首先,为了使我们的代理人更有帮助和无害,我们将良好对话的要求分解为代理人应遵循的自然语言规则,并分别向评估者询问每个规则。我们证明,这种崩溃使我们能够收集对代理行为的更多针对性的人类判断,并允许更有效的规则条件奖励模型。其次,我们的代理商在收集对模型声明的偏好判决时提供了支持事实主张的来源的证据。对于事实问题,麻雀提供的证据支持了78%的时间。比基线比基线更享受麻雀,同时对人类的对抗性探测更具弹性,在探测时只有8%的时间违反了我们的规则。最后,我们进行了广泛的分析,表明尽管我们的模型学会遵守我们的规则,但它可以表现出分布偏见。
translated by 谷歌翻译
大型语言模型会产生类似人类的文本,这些文本推动了越来越多的应用。但是,最近的文献以及越来越多的现实世界观察表明,这些模型可以产生有毒,有偏见,不真实或其他有害的语言。尽管正在进行评估语言模型危害的工作,但要远见卓识转换出可能出现的危害可能会引起严格的基准。为了促进这种翻译,我们概述了六种表征有害文本的方式,这些方法在设计新基准时值得明确考虑。然后,我们将这些特征用作镜头来识别现有基准中的趋势和差距。最后,我们将它们应用于视角API的案例研究,这是一种毒性分类器,被广泛用于HARS基准。我们的特征提供了一块桥梁,可以在远见和有效评估之间转化。
translated by 谷歌翻译
我们通过与与前面令牌的局部相似度,通过调节从大语料库检索的文档块来增强自动回归语言模型。尽管使用25美元\时分,我们的检索增强型变压器(RetroCro)的检索增强型变压器(RetroCr)对GPT-3和侏罗纪-1获得了可比性的性能。微调后,复古表演转换为下游知识密集型任务,如问题应答。复古结合了冷冻BERT猎犬,一种可微分的编码器和块状的横向机制,以预测基于数量级的令牌,而不是训练期间通常消耗的数量。我们通常从头开始训练复古,还可以快速改造预先接受的变压器,通过检索,仍然达到良好的性能。我们的工作通过以前所未有的规模开辟了通过显式内存改进语言模型的新途径。
translated by 谷歌翻译
本文旨在帮助构建与大规模语言模型(LMS)相关的风险景观。为了促进负责任的创新的进步,需要深入了解这些模型提出的潜在风险。详细分析了广泛的建立和预期的风险,借鉴了计算机科学,语言学和社会科学的多学科专业知识和文学。我们概述了六个具体风险领域:I.歧视,排除和毒性,II。信息危害,III。误导危害,V.恶意用途,V.人机互动危害,vi。自动化,访问和环境危害。第一个领域涉及陈规定型,不公平歧视,排他性规范,有毒语言和LMS社会群体的绩效。第二个重点侧重于私有数据泄漏或LMS正确推断敏感信息的风险。第三次解决贫困,虚假或误导性信息的风险,包括在敏感域中,以及敲门式风险,如共享信息的信任侵蚀。第四次考虑了试图使用LMS造成伤害的行动者的风险。第五部分侧重于用于支持与人类用户互动的会话代理的LLMS特异性的风险,包括不安全使用,操纵或欺骗。第六六探讨了对不同社会群体或社区可能产生不同影响的环境危害,工作自动化和其他挑战的风险。总的来说,我们审查了21个风险。我们讨论了不同风险的起源点和指向潜在的缓解方法。最后,我们讨论在实施减轻的组织职责,以及协作和参与的作用。我们强调了进一步研究的方向,特别是在扩展工具包时,用于评估和评估LMS中的概述风险。
translated by 谷歌翻译
The aim of this paper is to introduce a new learning procedure for neural networks and to demonstrate that it works well enough on a few small problems to be worth further investigation. The Forward-Forward algorithm replaces the forward and backward passes of backpropagation by two forward passes, one with positive (i.e. real) data and the other with negative data which could be generated by the network itself. Each layer has its own objective function which is simply to have high goodness for positive data and low goodness for negative data. The sum of the squared activities in a layer can be used as the goodness but there are many other possibilities, including minus the sum of the squared activities. If the positive and negative passes could be separated in time, the negative passes could be done offline, which would make the learning much simpler in the positive pass and allow video to be pipelined through the network without ever storing activities or stopping to propagate derivatives.
translated by 谷歌翻译
In heterogeneous networks (HetNets), the overlap of small cells and the macro cell causes severe cross-tier interference. Although there exist some approaches to address this problem, they usually require global channel state information, which is hard to obtain in practice, and get the sub-optimal power allocation policy with high computational complexity. To overcome these limitations, we propose a multi-agent deep reinforcement learning (MADRL) based power control scheme for the HetNet, where each access point makes power control decisions independently based on local information. To promote cooperation among agents, we develop a penalty-based Q learning (PQL) algorithm for MADRL systems. By introducing regularization terms in the loss function, each agent tends to choose an experienced action with high reward when revisiting a state, and thus the policy updating speed slows down. In this way, an agent's policy can be learned by other agents more easily, resulting in a more efficient collaboration process. We then implement the proposed PQL in the considered HetNet and compare it with other distributed-training-and-execution (DTE) algorithms. Simulation results show that our proposed PQL can learn the desired power control policy from a dynamic environment where the locations of users change episodically and outperform existing DTE MADRL algorithms.
translated by 谷歌翻译
Cell-free massive MIMO is emerging as a promising technology for future wireless communication systems, which is expected to offer uniform coverage and high spectral efficiency compared to classical cellular systems. We study in this paper how cell-free massive MIMO can support federated edge learning. Taking advantage of the additive nature of the wireless multiple access channel, over-the-air computation is exploited, where the clients send their local updates simultaneously over the same communication resource. This approach, known as over-the-air federated learning (OTA-FL), is proven to alleviate the communication overhead of federated learning over wireless networks. Considering channel correlation and only imperfect channel state information available at the central server, we propose a practical implementation of OTA-FL over cell-free massive MIMO. The convergence of the proposed implementation is studied analytically and experimentally, confirming the benefits of cell-free massive MIMO for OTA-FL.
translated by 谷歌翻译
As an efficient graph analytical tool, graph neural networks (GNNs) have special properties that are particularly fit for the characteristics and requirements of wireless communications, exhibiting good potential for the advancement of next-generation wireless communications. This article aims to provide a comprehensive overview of the interplay between GNNs and wireless communications, including GNNs for wireless communications (GNN4Com) and wireless communications for GNNs (Com4GNN). In particular, we discuss GNN4Com based on how graphical models are constructed and introduce Com4GNN with corresponding incentives. We also highlight potential research directions to promote future research endeavors for GNNs in wireless communications.
translated by 谷歌翻译
Dynamic evaluation of language models (LMs) adapts model parameters at test time using gradient information from previous tokens and substantially improves LM performance. However, it requires over 3x more compute than standard inference. We present Fast Weight Layers (FWLs), a neural component that provides the benefits of dynamic evaluation much more efficiently by expressing gradient updates as linear attention. A key improvement over dynamic evaluation is that FWLs can also be applied at training time so the model learns to make good use of gradient updates. FWLs can easily be added on top of existing transformer models, require relatively little extra compute or memory to run, and significantly improve language modeling perplexity.
translated by 谷歌翻译
The GLOM architecture proposed by Hinton [2021] is a recurrent neural network for parsing an image into a hierarchy of wholes and parts. When a part is ambiguous, GLOM assumes that the ambiguity can be resolved by allowing the part to make multi-modal predictions for the pose and identity of the whole to which it belongs and then using attention to similar predictions coming from other possibly ambiguous parts to settle on a common mode that is predicted by several different parts. In this study, we describe a highly simplified version of GLOM that allows us to assess the effectiveness of this way of dealing with ambiguity. Our results show that, with supervised training, GLOM is able to successfully form islands of very similar embedding vectors for all of the locations occupied by the same object and it is also robust to strong noise injections in the input and to out-of-distribution input transformations.
translated by 谷歌翻译